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A General Theory of Additive State Space Abstractions 



Informally, a set of abstractions of a state space S is additive if the distance between 
any two states in S is always greater than or equal to the sum of the corresponding dis- 
tances in the abstract spaces. The first known additive abstractions, called disjoint pattern 
databases, were experimentally demonstrated to produce state of the art performance on 
certain state spaces. However, previous applications were restricted to state spaces with 
special properties, which prcchides disjoint pattern databases from being defined for sev- 
eral commonly used testbeds, such as Rubik's Cube, TopSpin and the Pancake puzzle. In 
this paper we give a general definition of additive abstractions that can be applied to any 
state space and prove that heuristics based on additive abstractions arc consistent as well 
as admissible. We use this new definition to create additive abstractions for these testbeds 
and show experimentally that well chosen additive abstractions can reduce search time 
substantially for the (18,4)-TopSpin puzzle and by three orders of magnitude over state of 
the art methods for the 17-Pancake puzzle. We also derive a way of testing if the heuristic 
value returned by additive abstractions is provably too low and show that the use of this 
test can reduce search time for the 15-puzzle and TopSpin by roughly a factor of two. 

1. Introduction 

In its purest form, single-agent heuristic search is concerned with the problem of finding a 
least-cost path between two states {start and goal) in a state space given a heuristic function 
h{t, g) that estimates the cost to reach the goal state g from any state t. Standard algorithms 
for single-agent heuristic search such as IDA* (Korf, 1985) are guaranteed to find optimal 
paths if h{t, g) is admissible, i.e. never overestimates the actual cost to the goal state from 
t, and their efficiency is heavily influenced by the accuracy of h(t,g). Considerable research 
has therefore investigated methods for defining accurate, admissible heuristics. 

A common method for defining admissible heuristics, which has led to major advances 
in combinatorial problems (Culberson &: Schaeffer, 1998; Hcrnadvolgyi, 2003; Korf, 1997; 
Korf &: Taylor, 1996) and planning (Edelkamp, 2001), is to "abstract" the original state 
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space to create a new, smaller state space with the key property that for each path p in the 
original space there is a corresponding abstract path whose cost does not exceed the cost 
of p. Given an abstraction, h{t._ g) can be defined as the cost of the least-cost abstract path 
from the abstract state corresponding to t to the abstract state corresponding to g. The 
best heuristic functions defined by abstraction are typically based on several abstractions, 
and are equal to either the maximum, or the sum, of the costs returned by the abstractions 
(Korf k Felner, 2002; Felner, Korf, & Hanan, 2004; Holte, Felner, Newton, Meshulam, & 
Furcy, 2006). 

The sum of the costs returned by a set of abstractions is not always admissible. If it 

is, the set of abstractions is said to be "additive". The main contribution of this paper is 
to identify general conditions for abstractions to be additive. The new conditions subsume 
most previous notions of "additive" as special cases. The greater generality allows additive 
abstractions to be defined for state spaces that had no additive abstractions according to 
previous definitions, such as Rubik's Cube, TopSpin, the Pancake puzzle, and related real- 
world problems such as the genome rearrangement problem described by Erdem and Tillier 
(2005). Our definitions are fully formal, enabling rigorous proofs of the admissibility and 
consistency of the heuristics defined by our abstractions. Heuristic g) is consistent if for 
all states t, g and u, h{t, g) < cost{t, u) + h{u, g), where cost{t, u) is the cost of the least-cost 
path from t to u. 

The usefulness of our general definitions is demonstrated experimentally by defining 
additive abstractions that substantially reduce the CPU time needed to solve TopSpin and 
the Pancake puzzle. For example, the use of additive abstractions allows the 17-Pancake 
puzzle to be solved three orders of magnitude faster than previous state-of-the-art methods. 

Additional experiments show that additive abstractions are not always the best abstrac- 
tion method. The main reason for this is that the solution cost calculated by an individual 
additive abstraction can sometimes be very low. In the extreme case, which actually arises 
in practice, all problems can have abstract solutions that cost 0. The final contribution of 
the paper is to introduce a technique that is sometimes able to identify that the sum of the 
costs of the additive abstractions is provably too small ( "infeasible" ) . 

The remainder of the paper is organized as follows. An informal introduction to ab- 
straction is given in Section 2. Section 3 presents formal general definitions for abstractions 
that extend to general additive abstractions. We provide lemmas proving the admissibility 
and consistency of both standard and additive heuristics based on these abstractions. This 
section also discusses the relation to previous definitions. Section 4 describes successful ap- 
plications of additive abstractions to TopSpin and the Pancake puzzle. Section 5 discusses 
the negative results. Section 6 introduces "infcasibility" and presents experimental results 
showing its effectiveness on the sliding tile puzzle and TopSpin. Conclusions are presented 
in Section 7. 

2. Heuristics Defined by Abstraction 

To illustrate the idea of abstraction and how it is used to define heuristics, consider the 
well-known 8-puzzle (the 3x3 sliding tile puzzle). In this puzzle there are 9 locations in 
the form of a 3 x 3 grid and 8 tiles, numbered 1-8, with the 9*^* location being empty (or 
blank). A tile that is adjacent to the empty location can be moved into the empty location; 
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every move has a cost of 1 . The most common way of abstracting this state space is to treat 
several of the tiles as if they were indistinguishable instead of being distinct (Culberson & 
Schaeffer, 1996). An extreme version of this type of abstraction is shown in Figure 1. Here 
the tiles are all indistinguishable from each other, so an abstract state is entirely defined by 
the position of the blank. There are therefore only 9 abstract states, connected as shown 
in Figure 1. The goal state in the original puzzle has the blank in the upper left corner, 
so the abstract goal is the state shown at the top of the figure. The number beside each 
abstract state is the distance from the abstract state to the abstract goal. For example, in 
Figure 1, abstract state e is 2 moves from the abstract goal. A heuristic function h{t,g) for 
the distance from state t to ^ in the original space is computed in two steps: (1) compute 
the abstract state corresponding to t (in this example, this is done by determining the 
location of the blank in state t); and then (2) determine the distance from that abstract 
state to the abstract goal. The calculation of the abstract distance can either be done in a 
preprocessing step to create a heuristic lookup table called a pattern database (Culberson 
& Schaeffer, 1994, 1996) or at the time it is needed (Holte, Perez, Zimmer, &: MacDonald, 
1996; Holte, Grajkowski, & Tanner, 2005; Felner k Adler, 2005). 




Figure 1: An abstraction of the 8-puzzle. The white square in each state is the blank and 
the non- white squares are the tiles, which are all indistinguishable from each other 
in this abstraction. 



Given several abstractions of a state space, the heuristic hmaxit^g) can be defined as the 
maximum of the abstract distances for t given by the abstractions individually. This is the 
standard method for defining a heuristic function given multiple abstractions (Holte et al., 
2006). For example, consider state A of the 3x3 sliding tile puzzle shown in the top left 
of Figure 2 and the goal state shown below it. The middle column shows an abstraction 
of these two states (^i and gi) in which tiles 1, 3, 5, and 7, and the blank, are distinct 
while the other tiles are indistinguishable from each other. We refer to the distinct tiles as 
"distinguished tiles" and the indistinguishable tiles as "don't care" tiles. The right column 
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shows the complementary abstraction, in which tiles 1, 3, 5, and 7 are the "don't cares" 
and tiles 2, 4, 6, and 8 are distinguished. The arrows in the figure trace out a least-cost 
path to reach the abstract goal gt from state Ai in each abstraction. The cost of solving Ai 
is 16 and the cost of solving A2 is 12. Therefore, hmaxiA^d) is 16, the maximum of these 
two abstract distances. 
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Figure 2: Computation of hmax{A, g), the standard, maximum-based heuristic value for 
state A (top left) using the two abstractions shown in the middle and right 
columns. Solid arrows denote distinguished moves, dashed arrows denote "don't 
care" moves. 



2.1 Additive Abstractions 

Figure 3 illustrates how additive abstractions can be defined for the sliding tile puzzle (Korf 
& Felner, 2002; Felner et al., 2004; Korf & Taylor, 1996). State A and the abstractions are 
the same as in Figure 2, but the costs of the operators in the abstract spaces are defined 
differently. Instead of all abstract operators having a cost of 1, as was the case previously, 
an operator only has a cost of 1 if it moves a distinguished tile; such moves are called 
"distinguished moves" and are shown as solid arrows in Figures 2 and 3. An operator that 
moves a "don't care" tile (a "don't care" move) has a cost of and is shown as a dashed 
arrow in the figures. Least-cost paths in abstract spaces defined this way therefore minimize 
the number of distinguished moves without considering how many "don't care" moves are 
made. For example, the least-cost path for Ai in Figure 3 contains fewer distinguished 
moves (9 compared to 10) than the least-cost path for Ai in Figure 2 — and is therefore 
lower cost according to the cost function just described — but contains more moves in total 
(18 compared to 16) because it has more "don't care" moves (9 compared to 6). As Figure 
3 shows, 9 distinguished moves are needed to solve Ai and 5 distinguished moves are needed 
to solve A2. Because no tile is distinguished in both abstractions, a move that has a cost of 
1 in one space has a cost of in the other space, and it is therefore admissible to add the 
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two distances. The heuristic calculated using additive abstractions is referred to as hadd', in 
this example, hadd{-^-,g) = 9 + 5 = 14. Note that hadd{^,g) is less than hmaxiA,g) in this 
example, showing that heuristics based on additive abstractions are not always superior 
to the standard, maximum-based method of combining multiple abstractions even though 
in general they have proven very effective on the sliding tile puzzles (Korf &: Felner, 2002; 
Felner et al., 2004; Korf & Taylor, 1996). 
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Figure 3: Computation of hadd{A,g), the additive heuristic value for state A. Solid arrows 
denote distinguished moves, dashed arrows denote "don't care" moves. 



The general method defined by Korf, Felner, and colleagues (Korf & Felner, 2002; Felner 
et al., 2004; Korf &: Taylor, 1996) creates a set of k additive abstractions by partitioning 
the tiles into k disjoint groups and defining one abstraction for each group by making the 
tiles in that group distinguished in the abstraction. An important limitation of this and 
most other existing methods of defining additive abstractions is that they do not apply to 
spaces in which an operator can move more than one tile at a time, unless there is a way 
to guarantee that all the tiles that are moved by the operator are in the same group. 

An example of a state space that has no additive abstractions according to previous 
definitions is the Pancake puzzle. In the A^-Pancake puzzle, a state is a permutation of 
tiles (0, 1, — 1) and has A^ — 1 successors, with the successor formed by reversing 
the order of the first / + 1 positions of the permutation (1 < I < N —1). For example, in the 
4-Pancake puzzle shown in Figure 4, the state at the top of the figure has three successors, 
which are formed by reversing the order of the first two tiles, the first three tiles, and all four 
tiles, respectively. Because the operators move more than one tile and any tile can appear 
in any location there is no non-trivial way to partition the tiles so that all the tiles moved 
by an operator are distinguished in just one abstraction. Other common state spaces that 
have no additive abstractions according to previous definitions — for similar reasons — are 
Rubik's Cube and TopSpin. 
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Figure 4: In the 4-Pancake puzzle each state has three successors. 



The general definition of additive abstractions presented in the next section overcomes 
the limitations of previous definitions. Intuitively, abstractions will be additive provided 
that the cost of each operator is divided among the abstract spaces. Our definition provides 
a formal basis for this intuition. There are numerous ways to do this even when operators 
move many tiles (or, in other words, make changes to many state variables). For example, 
the operator cost might be divided proportionally across the abstractions based on the 
percentage of the tiles moved by the operator that are distinguished in each abstraction. 
We call this method of defining abstract costs "cost-splitting". For example, consider two 
abstractions of the 4-Pancake puzzle, one in which tiles and 1 are distinguished, the other 
in which tiles 2 and 3 are distinguished. Then the middle operator in Figure 4 would have 
a cost of I in the first abstract space and ^ in the second abstract space, because of the 
three tiles this operator moves, two are distinguished in the first abstraction and one is 
distinguished in the second abstraction. 

A different method for dividing operator costs among abstractions focuses on a spe- 
cific location (or locations) in the puzzle and assigns the full cost of the operator to the 
abstraction in which the tile that moves into this location is distinguished. We call this 
a "location-based" cost definition. In the Pancake puzzle it is natural to use the leftmost 
location as the special location since every operator changes the tile in this location. The 
middle operator in Figure 4 would have a cost of in the abstract space in which tiles 
and 1 are distinguished and a cost of 1 in the abstract space in which tiles 2 and 3 are 
distinguished because the operator moves tile 2 into the leftmost location. 

Both these methods apply to Rubik's Cube and TopSpin, and many other state spaces in 
addition to the Pancake puzzle, but the hadd heuristics they produce are not always superior 
to the hmax heuristics based on the same tile partitions. The theory and experiments in the 
remainder of the paper shed some light on the general question of when hadd is preferable 

to hyyinx- 



3. Formal Theory of Additive Abstractions 

In this section, we give formal definitions and lemmas related to state spaces, abstractions, 
and the heuristics defined by them, and discuss their meanings and relation to previous 
work. The definitions of state space etc. in Section 3.1 are standard, and the definition of 
state space abstraction in Section 3.2 differs from previous definitions only in one important 
detail: each state transition in an abstract space has two costs associated with it instead 
of just one. The main new contribution is the definition of additive abstractions in Section 
3.3. 
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The underlying structure of our abstraction definition is a directed graph (digraph) 
homomorphism. For easy reference, we quote here standard definitions of digraph and 
digraph homomorphism (Hell &: Nesetril, 2004). 

Definition 3.1 A digraph G is a finite set V = V{G) of vertices, together with a binary 
relation E = E{G) on V. The elements (u, v) of E are called the arcs of G. 

Definition 3.2 Let G and H be any digraphs. A homomorphism of G to H , written as 
f : G ^ H is a mapping f: V{G) — > V{H) such that {f(u),f{v)) G E{H) whenever 
{u,v) e E{G). 

Note that the digraphs G and H may have self-loops, (tt,u), and a homomorphism is 
not required to be surjective in either vertices or arcs. Wc typically refer to arcs as edges, 
but it should be kept in mind that, in general, they are directed edges, or ordered pairs. 

3.1 State Space 

Definition 3.3 A state space is a weighted directed graph S = (T, 11, G) where T is a finite 
set of states, IlC.TxTisa set of directed edges (ordered pairs of states) representing state 
transitions, and C : 11 — > JV = {0, 1,2,3,...} is the edge cost function. 

In typical practice, S is defined implicitly. Usually each distinct state in T corresponds to 
an assignment of values to a set of state variables. IT and G derive from a successor function, 
or a set of planing operators. In some cases, T is restricted to the set of states reachable 
from a given state. For example, in the 8-puzzle, the set of edges 11 is defined by the rule "a 
tile that is adjacent to the empty location can be moved into the empty location" , and the 
set of states T is defined in one of two ways: either as the set of states reachable from the 
goal state, or as the set of permutations of the tiles and the blank, in which case T consists 
of two components that are not connected to one another. The standard cost function G 
for the 8-puzzle assigns a cost of 1 to all edges, but it is easy to imagine cost functions for 
the 8-puzzle that depend on the tile being moved or the locations involved in the move. 

A path from state t to state g is a sequence of edges beginning at t and ending at 
g. Formally, p is a path from state t to state g ii p = (tt^, . . . ,7r"),7r^ G 11 where tt-^ = 
{P~^,P),j G {l,...,n} and t° = t, = g. Note the use of superscripts rather than 
subscripts to distinguish states and edges within a state space. The length of p is the 
number of edges n and its cost is G{p) = Yl]=i ^i'^'')- We use Paths{S,t,g) to denote the 
set of all paths from t to g in S. 

Definition 3.4 The optimal (minimum) cost of a path from state t to state g inS is defined 
by 

OPT(t, g) = min Gip) 

p£Paths{S,t,g) 

A pathfinding problem is a triple {S,t,g), where 5 is a state space and t,g ^ T, with 
the objective of finding the minimum cost of a path from t to g, or in some cases finding a 
minimum cost path p e Paths{S,t,g) such that G{p) = OPT{t,g). Having just one goal 
state may seem restrictive, but problems having a set of goal states can be accommodated 
with this definition by adding a virtual goal state to the state space with zero-cost edges 
from the actual goal states to the virtual goal state. 
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3.2 State Space Abstraction 

Definition 3.5 An Abstraction System is a pair {S, K) where S = {T, U, C) is a state space 

and H = {{Ai^tpi) \ Tpi : S ^ Ai, 1 < i < k} is a set o/ abstractions, where each abstraction 
is a pair consisting of an abstract state space and an abstraction mapping, where "abstract 
state space" and "abstraction mapping" are defined below. 

Note that these abstractions are not intended to form a hierarchy and should be con- 
sidered a set of independent abstractions. 

Definition 3.6 An abstract state space is a directed graph with two weights per edge, 
defined by a four-tuple Ai = {Ti, Ilj, Cj, Ri). 

Ti is the set of abstract states and 11^ is the set of abstract edges, as in the definition of 
a state space. In an abstract space there are two costs associated with each VTj £ Hi, the 
primary cost Ci : Xlj — > N and the residual cost Ri : Ifj — > M . The idea of having 
two costs per abstract edge, instead of just one, is inspired by the practice, illustrated in 
Figure 3, of having two types of edges in the abstract space and counting distinguished 
moves differently than "don't care" moves. In that example, our primary cost is the cost 
associated with the distinguished moves, and our residual cost is the cost associated with 
the "don't care" moves. The usefulness of considering the cost of "don't care" moves arises 
when the abstraction system is additive, as suggested by Lemmas 3.6 and 3.10 below. These 
indicate when the additive heuristic is infeasible and can be improved, the effectiveness of 
which will become apparent in the experiments reported in Section 6. 

Like edges, each abstract path pi = {irj, . . . , tt") in Ai has a primary and residual cost: 
CiiPi) = E"=i CiiTri), and Ri{pi) = E^=i RiiTrj). 

Definition 3.7 An abstraction mapping ^i : S — > Ai between state space S and abstract 
state space Ai is defined by a mapping between the states of S and the states of Ai, ■i/'i '■ 
T Ti, that satisfies the two following conditions. 

The first condition is that the mapping is a homomorphism and thus connectivity in the 
original space is preserved, i.e., 

(1) ^u,v) eU,{il^i{u),A{v)) eu. 

In other words, for each edge in the original space S there is a corresponding edge in the 
abstract space Ai. Note that ii u ^ v and tpi{u) = ipi{v) then a non-identity edge in S gets 
mapped to an identity edge (self-loop) in Ai- We use the shorthand notation t-- = tpi{P) 
for the abstract state in Tj corresponding to P G T, and irj = ipi{'iT^) = {'^i{u^),il)i{v^)) for 
the abstract edge in Ilj corresponding to tt^ = {u^,v^) G 11. 

The second condition that the state mapping must satisfy is that abstract edges must 
not cost more than any of the edges they correspond to in the original state space, i.e., 

(2) Vvr G n, CiiTTi) + Riim) < C{7r) 
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As a consequence, if multiple edges in the original space map to the same abstract edge 
p G Ilj, as is usually the case, Ci(p) + Ri{p) must be less than or equal to all of them, i.e., 

Vp G n„ C,(p) + R^{p) < min C{ti) 

Note that if no edge maps to an edge in the abstract space, then no bound on the cost of 
that edge is imposed. 

For example, the state mapping used to define the abstraction in the middle column 

of Figure 3 maps an 8-puzzle state to an abstract state by renaming tiles 2, 4, 6, and 8 
to "don't care". This mapping satisfies condition (1) because "don't care" tiles can be 
exchanged with the blank whenever regular tiles can. It satisfies condition (2) because each 
move is either a distinguished move {Ci{iTi) = 1 and Ri{iTi) = 0) or a "don't care" move 
(Ci(7ri) = and Ri{'Ki) = 1) and in both cases Ci{iTi) + Ri{'Ki) = 1, the cost of the edge tt 
in the original space. 

The set of abstract states Tj is usually equal to ipi{T) = {'tpi{t) \ t & T}, but it can be a 
superset, in which case the abstraction is said to be non-surjective (Hernadvolgyi & Holte, 
2000). Likewise, the set of abstract edges ITj is usually equal to = {i^iiir) | tt G 11} 

but it can be a superset even if Tj = ipi{T). In some cases, one deliberately chooses an 
abstract space that has states or edges that have no counterpart in the original space. For 
example, the methods that define abstractions by dropping operator preconditions must, by 
their very design, create abstract spaces that have edges that do not correspond to any edge 
in the original space {e.g. Pearl, 1984). In other cases, non-surjectivity is an inadvertent 
consequence of the abstract space being defined implicitly as the set of states reachable 
from the abstract goal state by applying operator inverses. For example, if a tile in the 
2x2 sliding tile puzzle is mapped to the blank in the abstract space, the puzzle now has 
two blanks and states are reachable in the abstract space that have no counterpart in the 
original space (Hernadvolgyi & Holte, 2000). For additional examples and an extensive 
discussion of non-surjectivity see the previous paper by Holte and Hernadvolgyi (2004). 

All the lemmas and definitions that follow assume an abstraction system (5, K) contain- 
ing k abstractions has been given. Conditions (1) and (2) guarantee the following. 

Lemma 3.1 For any path p G Paths{S, u^, v?) in S, there is a corresponding abstract path 

i^i{p) from uj to uj in Ai and Ci{'il)i{p)) + Ri{ipi{p)) <C{p). 

Proof: By definition, p £ Paths{S, u^, n^) in 5 is a sequence of edges (vr^, . . . , vr"^), vr-' G H 
where tt^ = G {1, . . . ,n} and t^ = u^,t"' = u^. Because Hj ^ ijjiijl), each of the 

corresponding abstract edges exists (tt^ G Hj). Because ti} = {u\,t]) and vrf = (t"^"'^,u|), 
the sequence il^i{p) = {it}, . . . , vr^) is a path from u\ to uf. 

By definition, C{p) = Yl^=i^i'^'')- For each vr-', Condition (2) ensures that C{tt^) > 
ainD+R^ini), and therefore C(p) > E^i(a(7rD+i?.(7r^)) = E^i a(7ri)+E^i i?,(7r^) = 

CMp)) + MMp))- □ 

For example, consider state A and goal g in Figure 3. Because of condition (1), any path 
from state A to 5 in the original space is also a path from abstract state Ai to abstract goal 
state gi and from abstract state A2 to 52 in the abstract spaces. Because of condition (2), 
the cost of the path in the original space is greater than or equal to the sum of the primary 
cost and the residual cost of the corresponding abstract path in each abstract space. 
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We use Paths{Ai., u, v) to mean the set of all paths from u to in space Ai- 

Definition 3.8 The optimal abstract cost from abstract state u to abstract state v in Ai 
is defined as 

OPTiiu, v) = ^ min Ci{q) + Ri{q) 
q^Paths{Ai ,u,v) 

Definition 3.9 We define the heuristic obtained from abstract space Ai for the cost from 
state t to g as 

hi{t,g) = OPTi{ti,gi). 

Note that in these definitions, the path minimizing the cost is not required to be the image, 
ipi{p), of a path p in S. 

The following prove that the heuristic generated by each individual abstraction is ad- 
missible (Lemma 3.2) and consistent (Lemma 3.3). 

Lemma 3.2 hi{t, g) < OPT(i, g) for all t,geT and all i e {1, . . . ,k}. 

Proof: By Lemma 3.1, C{p) > Ci{ipi{p)) + Ri{il)i{p)), and therefore 

min C{p)>^ min Ci{tpi{p)) + Riii/jiip)). 

pEPaths{S,t,g) pEPaths{S,t,g) 

The left hand side of this inequality is OPT{t,g) by definition, and the right hand side 
is proved in the following Claim 3.2.1 to be greater than or equal to hi{t,g). Therefore, 

OPT{t,g) > hi{t,g). 

Claim 3.2.1 mm^^pa^f^s(^s,t,g) Ciii^iif)) + Ri{A{p)) > hi{t,g) for aU t,g eT. 

Proof of Claim 3.2.1: By Lemma 3.1 for every path p there is a correspond- 
ing abstract path. There may also be additional paths in the abstract space, that is, 
{"^iip) I P £ Paths{S,t, g)} C Paths{Ai,ti, gi). It follows that {Ci(?/;i(p)) + Ri{ipi{p)) \ p G 
Paths{S,t, g)} C {Ci{q} + Ri{q) \ q G Paths{Ai,ti, gi)}. Therefore, 

min Cii^iip)) + RiiMP)) > . min Ci{q) + Ri{q) = OPTi{ti, gi) = hi{t, g) 

p€Paths(S,t,g) q&Paths{Ai,ti,gi) 

□ 

Lemma 3.3 hi{t^,g) < OPT {t^, t"^) + hi{t'^,g) for allt^,t^,g G T and all i e {1,...,A;}. 

Proof: By the definition of OPTj as a minimization and the definition of hi{t,g), it follows 
that h,{t\g) = OPTi{tj,gi) < OPT,{t},tJ) + OPTi{tlg,) = OPT,{tj,t^) + hi{t^,g). 

To complete the proof, we observe that by Lemma 3.2, OPT(i^,t^) > hi{t^,t^) = 
OPTi{t],ti). □ 

Definition 3.10 The hmax heuristic from state t to state g defined by an abstraction system 
is 

k 

hmax{t,g) = max hi{t,g) 
1=1 

From Lemmas 3.2 and 3.3 it immediately follows that hmax is admissible and consistent. 
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3.3 Additive Abstractions 

In this section, we formalize the notion of "additive abstraction" that was introduced intu- 
itively in Section 2.1. The example there showed that hadd{t,g), the sum of the heuristics 
for state t defined by multiple abstractions, was admissible provided the cost functions in 
the abstract spaces only counted the "distinguished moves" . In our formal framework, the 
"cost of distinguished moves" is captured by the notion of primary cost. 

Definition 3.11 For any pair of states t,g the additive heuristic given an abstraction 
system is defined to be 

k 

hadd{t,g) = ^C*{ti,gi). 

i=l 

where 

Ci{U,9i) = ^ „ pin Ci{q) 
qePaths{Ai,ti,gi) 

is the minimum primary cost of a path in the abstract space from ti to gi . 

In Figure 3, for example, Cl{Ai^gi) = 9 and C|(^2)5'2) = 5 because the minimum 
number of distinguished moves to reach gi from Ai is 9 and the minimum number of 
distinguished moves to reach g2 from A2 is 5. 

Intuitively, hadd will be admissible if the cost of edge tt in the original space is divided 
among the abstract edges tti that correspond to tt, as is done by the "cost-splitting" and 
"location-based" methods for defining abstract costs that were introduced at the end of 
Section 2.1. This leads to the following formal definition. 

Definition 3.12 An abstraction system (5, K) is additive if^/ir G 11,^*1^^ Ci{iTi) < C{Tr). 

The following prove that hadd is admissible (Lemma 3.4) and consistent (Lemma 3.5) 
when the abstraction system {S, K) is additive. 

Lemma 3.4 // is additive then hadd{t^9) < OPT(i, 5) for all t,g ^T. 

Proof: Assume that OFT{t,g) = C{p), where p = (tt-^, . . . ,7r") G Paths{S,t,g). Therefore, 
OPT{t,g) = YJj=i Since {S, K) is additive, it follows by definition that 

n n k k n 

j=i j=i 1=1 1=1 j=i 

k 

^ XI ^»*(*'' f') = hadd{t, g) 

i=l 

where the last line follows from the definitions of C* and hadd- □ 

Lemma 3.5 If{S,ii.) is additive then haddit^ , g) < OPT^it^,t^)+Kddit'^,g) for allt^,t'^,g e 
T. 
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Proof: C*{t},gi) obeys the triangle inequality: C*{t\,gi) < C*{tj,tf) + C*{tf,gi) for all 
t\t\g e T. It follows that Eti C*itj,gi) < Eti C*itj,tf) + Eti C*{tlg.). 

Because EtiCr(4'ffO = hadd{t\g) and Y!l=iC*{ti,gi) = haddif^ , g) , it follows that 

Since is additive, by Lemma 3.4, OFT{t\t'^) > ^^t^ C*{tj,tf). 

Hence hadd{t\g) < OFT{t\t'^) + hadd{t^,g) for all t\ t^, ^ g r. □ 

We now develop a simple test that has important consequences for additive heuristics. 
Define Pi{ti,gi) = {q \ q ^ Paths{Ai,ti, gi) and Ci{q) = Cl{ti,gi)}, the set of abstract 
paths from ti to gi whose primary cost is minimal. 

Definition 3.13 The conditional optimal residual cost is the minimum residual cost among 
the paths in Pi{ti,gi): 

Ri{ti,9i)= min Ri{q) 

Note that the value of {C*{ti,gi) + R*{ti,gi)) is sometimes, but not always, equal to the 
optimal abstract cost OPTj(tj, (7^). In Figure 3, for example, OPTi{Ai, gi) = 16 (a path 
with this cost is shown in Figure 2) and Cl{Ai,gi) + Rl{Ai,gi) = 18, while 01(^2,52) + 
i?2(^2,5'2) = OPT2(^2j52) = 12. As the following lemmas show, it is possible to draw 
important conclusions about hadd by comparing its value to {C*{ti,gi) + R*{ti, gi)). 

Lemma 3.6 Let be any additive abstraction system and let t,g be any states. If 

hadd{t,g) > C*{tj,gj) + R*{tj,gj) for all j G {l,...,k}, then hadd{t,g) > hmax{t,g)- 

Proof: By the definition of OPTi(ti,(/i),Vj G {I, . . . ,k},C*{tj,gj)+R*{tj,gj) > OFTj{tj,gj). 
Therefore, Vj G {1, . . . , k}, hadd{t,g) > C*{tj,gj) + R*{tj,gj) > OPTj{tj,gj) hadd{t,g) > 
maxi<j<fe OPTi(ij, g^) = hmax{t, g)- □ 

Lemma 3.7 For an additive {S, i^) and path p G Paths{S, t,g) withC{p) = Yli=i ^tiUigi), 
Cji'il^jip)) = C*{tj,gj) for all j G {1, . . . , k}. 

Proof: Suppose for a contradiction that there exists some ii, such that Ci^{ijji^{p)) > 
C*_^{ti^, gij^). Then because C{p) = Yli=i^i(.'^i^9i)j there must exist some i2, such that 
Ci2{ipi2ip)) < C*^{ti,^, gi.^), which contradicts the definition of C*. Therefore, such an ii 
does not exist and Cj{il)j{p)) = Cj{tj,gj) for all j G {1, . . . , k}. □ 

Lemma 3.8 For an additive {S, 'R) and a pathp G Paths{S, t, g) withC{p) = Yli=i ^tiUi gi), 

RiiMp)) > R*i{ti,gi) for all ie{i,...,k}. 

Proof: Following Lemma 3.7 and the definition of Pi(ti,gi), ipi{p) G Pi{ti,gi) for all i G 
{1, . . . , A;}. Because R*(ti,gi) is the smallest residual cost of paths in Pi{ti,gi), it follows 
that Ri{^i{p))>R*{ti,gi). □ 

Lemma 3.9 For an additive {S andapathp£ Paths{S,t, g) withC{p) = '^i=iC^{ti, gi), 
Ei=i C*{ti,gi) > C*{t„gj) + R*{tj,gj) for all j G {1, . . . , k}. 
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Proof: By Lemma 3.1, C{p) > Cj{'il)j{p)) + Rj{'il)j{'p)) for all j G {1, . . . , k}. By Lemma 3.7 
CMjiv)) = Cj{tj,9j), and by Lemma 3.8 Rj{ipj{p)) > R*{tj,gj). Therefore C{p) > 
C*{tj,gj)+Rj(tj,gj), and the lemma follows from the premise that C{p) = C^{ti,gi). 

□ 

Lemma 3.10 Let (5,^^) be any additive abstraction system and let t,g eT be any states. 
If hadd{t, g) < C*{tj,gj) + R*{tj,gj) for some j e {1,..., k}, then hadd{t, g) ^ OPT{t, g). 

Proof: This lemma follows directly as the contrapositive of Lemma 3.9. □ 

Lemma 3.6 gives a condition under which hadd is guaranteed to be at least as large as 
hmax for a specific states t and g. If this condition holds for a large fraction of the state 
space T, one would expect that search using hadd to be at least as fast as, and possibly 
faster than, search using hmax- This will be seen in the experiments reported in Section 4. 
The opposite is not true in general, i.e., failing this condition does not imply that hmax will 
result in faster search than hadd- However, as Lemma 3.10 shows, there is an interesting 
consequence when this condition fails for state t: we know that the value returned by hadd 
for t is not the true cost to reach the goal from t. Detecting this is useful because it allows 
the heuristic value to be increased without risking it becoming inadmissible. Section 6 
explores this in detail. 

3.4 Relation to Previous Work 

The aim of the preceding formal definitions is to identify fundamental properties that guar- 
antee that abstractions will give rise to admissible, consistent heuristics. We have shown 
that the following two conditions guarantee that the heuristic defined by an abstraction is 
admissible and consistent 

(PI) V(tx,t;) Gn,(V'^(n),^,(?;)) GHi 

(P2) Vtt G n, C(7r) > di-Ki) + Riim) 

and that a third condition 

k 

(P3) V7rGn,C(7r) > ^a(7r0 

i=l 

guarantees that hadd{t,g) is admissible and consistent. 

Previous work has focused on defining abstraction and additivity for specific ways of 
representing states and transition functions. These are important contributions because 
ultimately one needs computationally effective ways of defining the abstract state spaces, 
abstraction mappings, and cost functions that our theory takes as given. The importance 
of our contribution is that it should make future proofs of admissibility, consistency, and 
additivity easier, because one will only need to show that a particular method for defining 
abstractions satisfies the three preceding conditions. These are generally very simple con- 
ditions to demonstrate, as we will now do for several methods for defining abstractions and 
additivity that currently exist in the literature. 
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3.4.1 Previous Definitions of Abstraction 

The use of abstraction to create heuristics began in the late 1970s and was popularized in 
Pearl's landmark book on heuristics (Pearl, 1984). Two abstraction methods were identi- 
fied at that time: "relaxing" a state space definition by dropping operator preconditions 
(Gaschnig, 1979; Guida & Somalvico, 1979; Pearl, 1984; Valtorta, 1984), and "homomor- 
phic" abstractions (Banerji, 1980; Kibler, 1982). These early notions of abstraction were 
unified and extended by Mostow and Prieditis (1989) and Prieditis (1993), producing a 
formal definition that is the same as ours in all important respects except for the concept 
of "residual cost" that we have introduced.^ 

Today's two most commonly used abstraction methods are among the ones implemented 
in Prieditis's Absolver II system (Prieditis, 1993). The first is "domain abstraction", which 
was independently introduced in the seminal work on pattern databases (Culberson &: 
Schaeffer, 1994, 1998) and then generalized (Hernadvolgyi &: Holte, 2000). It assumes a 
state is represented by a set of state variables, each of which has a set of possible values 
called its domain. An abstraction on states is defined by specifying a mapping from the 
original domains to new, smaller domains. For example, an 8-puzzle state is typically 
represented by 9 variables, one for each location in the puzzle, each with the same domain 
of 9 elements, one for each tile and one more for the blank. A domain abstraction that 
maps all the elements representing the tiles to the same new element ("don't care") and 
the blank to a different clement would produce the abstract space shown in Figure 1. The 
reason this particular example satisfies property (PI) is explained in Section 3.2. In general, 
a domain abstraction will satisfy property (PI) as long as the conditions that define when 
state transitions occur {e.g. operator preconditions) are guaranteed to be satisfied by the 
"don't care" symbol whenever they are satisfied by one or more of the domain elements 
that map to "don't care". Property (P2) follows immediately from the fact that all state 
transitions in the original and abstract spaces have a primary cost of 1. 

The other major type of abstraction used today, called "drop" by Prieditis (1993), was 
independently introduced for abstracting planning domains represented by grounded (or 
propositional) STRIPS operators (Edelkamp, 2001). In a STRIPS representation, a state 
is represented by the set of logical atoms that are true in that state, and the directed edges 
between states are represented by a set of operators, where each operator a is described by 
three sets of atoms, P{a), A{a), and D{a). P{a) lists a's preconditions: a can be applied to 
state t only if all the atoms in P{a) are true in t (i.e., P{a) C t). A{a) and D{a) specify the 
effects of operator a, with A(a) listing the atoms that become true when a is applied (the 
"add" list) and D{a) listing the atoms that become false when a is applied (the "delete" 
list). Hence if operator a is applicable to state t, the state u = a{t) it produces when applied 
to t is the set of atoms u = (t — D{a)) U A{a). 

In this setting, Edelkamp defined an abstraction of a given state space by specifying a 
subset of the atoms and restricting the abstract state descriptions and operator definitions 
to include only atoms in the subset. Suppose is the subset of the atoms underlying 
abstraction mapping tpi : S — > Ai, where S is the original state space and Ai is the abstract 
state space based on V^. Two states in S will be mapped to the same abstract state if and 

1. Prieditis's definition allows an abstraction to expand the set of goals. This can be achieved in our 
definition by mapping non-goal states in the original space to the same abstract state as the goal. 
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only if they contain the same subset of atoms in Vi, i.e., '(pi{t) = ipi{u) iS tCiVi = uCiVi. 
This satisfies property (PI) because operator a being appHcable to state t {P{a) C t) 
implies abstract operator = ipi{a) is applicable to abstract state ti (-P(a) CiVi C. tCiVi) 
and the resulting state a{t) = {t — D{a)) U A{a) is mapped by ipi to ai{ipi{t)) because 
set intersection distributes across set subtraction and union (Vi n {{t — D{a)) U A{a)) = 
{{Vi nt) - {Vi n D{a))) U {Vi n A{a))). Again, property (P2) follows immediately from the 
fact that all operators in the original and abstract spaces have a primary cost of 1. 

Recently, Helmert et al. (2007) described a more general approach to defining abstrac- 
tions for planning based on "transition graph abstractions" . A transition graph is a directed 
graph in which the arcs have labels, and a transition graph abstraction is a directed graph 
homomorphism that preserves the labels.^ Hence, Helmert et al.'s method is a restricted 
version of our definition of abstraction and therefore satisfies properties (PI) and (P2). 
Helmert et al. make the following interesting observations that are true of our more general 
definition of abstractions: 

• the composition of two abstractions is an abstraction. In other words, if ?/; : <S — > A 

is an abstraction of S and (j) : A — > B is an abstraction of A, then {(f>o^) : S — > B is 
an abstraction of S. This property of abstractions was exploited by Prieditis (1993). 

• the "product" Ai X A2 of two abstractions, Ai and A2, of S is an abstraction of S, 
where the state space of the product is the Cartesian product of the two abstract 
state spaces, and there is an edge 7rix2 in the product space from state (^1,^2) to state 
{uijUi) if there is an edge tti from ti to ui in Ai and there is an edge 7r2 from t2 to 
U2 in A2. The primary cost of 7rix2 is the minimum of Ci(7ri) and C2(7r2) and the 
residual cost of 7rix2 is taken from the same space as the primary cost. Because they 
are working with labelled edges Helmert et al. require the edge connecting ti to ui 
to have the same label as the edge connecting t2 to U2; this is called a "synchronized" 
product and is denoted ^1 (8)^2 (refer to Definition 6 defined by Helmert et al. (2007) 
for the exact definition of synchronized product). 

Figure 5 shows the synchronized product, B, of two abstractions, Ai and A2, of the 
3-state space S in which the edge labels are a and b. Ai is derived from S by mapping 
states si and S2 to the same state (51^2); and A2 is derived from S by mapping states S2 
and S3 to the same state {32,3). Note that B contains four states, more than the original 
space. It is an abstraction of S because the mapping of original states si, S2, and S3 to 
states (si,2,si) {si^2,S2,3) and {s3,S2,?,), respectively, satisfies property (PI), and property 
(P2) is satisfied automatically because all edges have a cost of 1. From this point of view 
the fourth state in B, (53, si), is redundant with state (si,2, si). Nevertheless it is a distinct 
state in the product space. 

Haslum et al. (2005) introduce a family of heuristics, called h"^ (for any fixed m G 
{1,2, ...}), that are based on abstraction, but are not covered by our definition because the 
value of the heuristic for state t, h"^{t), is not defined as the distance from the abstraction 
of t to the abstract goal state. Instead it takes advantage of a special monotonicity property 

2. "Homomorphism" here means the standard definition of a digraph homomorphism (Definition 3.2), 

which permits non-surjectivity (as discussed in Section 3.2), as opposed to Helmert et al.'s definition of 
"homomorphism" , which does not allow non-surjectivity. 
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B=Ai®A2 



$2 ~ S3 




(*^3' '^2,3) 



Figure 5: 5 is the original state space. ^1 and A2 are abstractions of 5. B = Ai A2 is 
the synchronized product of ^1 and A2- 



of costs in planning problems: the cost of achieving a subset of the atoms defining the goal 
is a lower bound on the cost of achieving the goal. When searching backwards from the 
goal to the start state, as Haslum et al. do, this allows an admissible heuristic to be defined 
in the following recursive minimax fashion (|t| denotes the number of atoms in state t): 



0, 



t C start 



min C{s,t) + h"^{s), \t\ < m 
(s,t)en 

max h"^{s), \t\ > m 

sGt,\s\<m, 



The first two lines of this definition are the standard method for calculating the cost of a 
least-cost path. It is the third line that uses the fact that the cost of achieving any subset 
of the atoms in t is a lower bound on the cost of achieving the entire set of atoms t. The 
recursive calculation alternates between the min and max calculation depending on the 
number of atoms in the state currently being considered in the recursive calculation, and 
is therefore different than a shortest path calculation or taking the maximum of a set of 
shortest path calculations. 



3.4.2 Previous definitions of additive abstractions 

Prieditis (1993) included a method ("Factor") in his Absolver II system for creating additive 
abstractions, but did not present any formal definitions or theory. 

The first thorough discussion of additive abstractions is due to Korf and Taylor (1996). 
They observed that the sliding tile puzzle's Manhattan Distance heuristic, and several of 
its enhancements, were the sum of the distances in a set of abstract spaces in which a small 
number of tiles were "distinguished" . As explained in Section 2.1, what allowed the abstract 
distances to be added and still be a lower bound on distances in the original space is that 
only the moves of the distinguished tiles counted towards the abstract distance and no tile 
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was distinguished in more than one abstraction. This idea was later developed in a series 
of papers (Korf &: Fclner, 2002; Fclner et al., 2004), which extended its application to other 
domains, such as the 4-peg Towers of Hanoi puzzle. 

In the planning literature, the same idea was proposed by Haslum ct al. (2005), who 
described it as partitioning the operators into disjoint sets Bi, ...Bk and counting the cost 
of operators in set Bi only in abstract space Ai. The example they give is that in the 
Blocks World operators that move block i would all be in set Bi, effectively defining a set 
of additive abstractions for the Blocks World exactly analogous to the Korf and Taylor 
abstractions that define Manhattan Distance for the sliding tile puzzle. 

Edelkamp (2001) took a different approach to defining additive abstractions for STRIPS 
planning representations. His method involves partitioning the atoms into disjoint sets 
Fi, ...Vfc such that no operator changes atoms in more than one group. If abstract space 
Ai retains only the atoms in set Vi then the operators that do not affect atoms in Vi will 
have no effect at all in abstract space Ai and will naturally have a cost of in ^j. Since 
no operator affects atoms in more than one group, no operator has a non-zero cost in more 
than one abstract space and distances in the abstract spaces can safely be added. Haslum et 
al. (2007) extended this idea to representations in which state variables could have multiple 
values. In a subsequent paper Edelkamp (2002) remarks that if there is no partitioning of 
atoms that induces a partitioning of the operators as just described, additivity could be 
"enforced" by assigning an operator a cost of zero in all but one of the abstract spaces — a 
return to the Korf and Taylor idea. 

All the methods just described might be called "all-or-nothing" methods of defining 
abstract costs, because the cost of each edge C(7r) is fully assigned as the cost of the 
corresponding abstract edge Ci{'Ki) in one of the abstractions and the corresponding edges 
in all the other abstractions are assigned a cost of zero. Any such method obviously satisfies 
property (P3) and is therefore additive. 

Our theory of additivity does not require abstract methods to be defined in an all-or- 
nothing manner, it allows C{tt) to be divided in any way whatsoever among the abstractions 
as long as property (P3) is satisfied. This possibility has been recognized in one recent pub- 
lication (Katz & Domshlak, 2007), which did not report any experimental results. This 
generalization is important because it eliminates the requirement that operators must move 
only one "tile" or change atoms /variables in one "group" , and the related requirement that 
tiles/atoms be distinguished/represented in exactly one of the abstract spaces. This re- 
quirement restricted the application of previous methods for defining additive abstractions, 
precluding their application to state spaces such as Rubik's Cube, the Pancake puzzle, and 
TopSpin. As the following sections show, with our definition, additive abstractions can be 
defined for any state space, including the three just mentioned. 

Finally, Helmert et al. (2007) showed that the synchronized product of additive abstrac- 
tions produces a heuristic hgprod that dominates haddi iii ttis sense that hsprod{s) > hadd{s) 
for all states s. This happens because the synchronized product forces the same path to be 
used in all the abstract spaces, whereas the calculation of each C* in hadd can be based on 
a different path. The discussion of the negative results and infeasibility below highlight the 
problems that can arise because each is calculated independently. 
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4. New Applications of Additive Abstractions 

This section and the next section report the results of applying the general definition of 
additive abstraction given in the previous section to three benchmark state spaces: TopSpin, 
the Pancake puzzle and Rubik's Cube. A few additional experimental results may be found 
in the previous paper by Yang, Culberson, and Holte (2007). In all our experiments all edges 
in the original state spaces have a cost of 1 and wc define Ri{'Ki) = 1 — Ci{'Ki), its maximum 
permitted value when edges cost 1. We use pattern databases to store the heuristic values. 
The pre-processing time required to compute the pattern databases is excluded from the 
times reported in the results, because the PDB needs to be calculated only once and this 
overhead is amortized over the solving of many problem instances. 

4.1 Methods for Defining Costs 

We will investigate two general methods for defining the primary cost of an abstract state 
transition Cj(7ri), which we call "cost-splitting" and "location-based" costs. To illustrate 
the generality of these methods we will define them for the two most common ways of 
representing states — as a vector of state variables, which is the method we implemented in 
our experiments, and as a set of logical atoms as in the STRIPS representation for planning 
problems. 

In a state variable representation a state t is represented by a vector of m state variables, 
each having its own domain of possible values Dj, i.e., t = {t{0), ...,t{m — 1)), where 
t{j) G Dj is the value assigned to the j*'* state variable in state t. For example, in puzzles 
such as the Pancake puzzle and the sliding tile puzzles, there is typically one variable for 
each physical location in the puzzle, and the value oit{j) indicates which "tile" is in location 
j in state t. In this case the domain for all the variables is the same. State space abstractions 
are defined by abstracting the domains. In particular, in this setting domain abstraction ipi 
will leave specific domain values unchanged (the "distinguished" values according to ipi) and 
map all the rest to the same special value, "don't care" . The abstract state corresponding 
to t according to ipi is ti={ti{0), ...,ti{m — 1)) with ti{j) = il)i{t{j)). As in previous research 
with these state spaces a set of abstractions is defined by partitioning the domain values into 
disjoint sets Ei, ...,Ek with Ei being the set of distinguished values in abstraction i. Note 
that the theory developed in the previous section does not require the distinguished values 
in different abstractions to be mutually exclusive; it allows a value to be distinguished in 
any number of abstract spaces provided abstract costs are defined appropriately. 

As mentioned previously, in a STRIPS representation a state is represented by the 
set of logical atoms that are true in the state. A state variable representation can be 
converted to a STRIPS representation in a variety of ways, the simplest being to define an 
atom for each possible variable-value combination. If state variable j has value v in the 
state variable representation of state t then the atom variable-j-has-value-v is true in the 
STRIPS representation of t. The exact equivalent of domain abstraction can be achieved 
by defining Vi, the set of atoms to be used in abstraction i, to be all the atoms variable-j- 
has-value-v in which v G E^, the set of distinguished values for domain abstraction i. 
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4.1.1 Cost-splitting 

In a state variable representation, the cost-splitting method of defining primary costs works 
as follows. A state transition vr that changes b'^ state variables has its cost, C{tt), split 
among the corresponding abstract state transitions vri , . . . , tt^ in proportion to the number 
of distinguished values they assign to the variables, i.e., in abstraction i 

ru\ * ^^""^ 



6^ 

if TT changes variables and bj of them are assigned distinguished values by TTj.^ For 
example, the 3x3x3 Rubik's cube is composed of twenty little moveable "cubies" and 
each operator moves eight cubies, four corner cubies and four edge cubies. Hence b'^ = 8 
for all state transitions vr. If a particular state transition moves three cubies that are 
distinguished according to abstraction ipi, the corresponding abstract state transition, vTi, 
would cost |. Strictly speaking, we require abstract edge costs to be integers, so the 
fractional edge costs produced by cost-splitting must be scaled appropriately to become 
integers. Our implementation of cost-splitting actually does this scaling but it will simplify 
our presentation of cost-splitting to talk of the edge costs as if they were fractional. 

If each domain value is distinguished in at most one abstraction {e.g. if the abstractions 
arc defined by partitioning the domain values) cost-splitting produces additive abstractions, 
i.e., C{tt) > ^^=xCi(7rj) for all vr G 11. Because C{tt) is known to be an integer, hadd can 
be defined to be the ceiling of the sum of the abstract distances, [ Yli=i C'i(7ri)], instead of 
just the sum. 

With a STRIPS representation, cost-splitting could be defined identically, with b'^ being 
the number of atoms changed (added or deleted) by operator vr in the original space and bf 
being the number of atoms changed by the corresponding operator in abstraction i. 

4.1.2 Location-based Costs 

In a location-based cost definition for a state variable representation, a state variable loa,r 

is associated with state transition vr and vr's full cost C(7r) is assigned to abstract state 
transition vrj if tTj changes the value of variable /oCtt to a value that is distinguished according 
to tpi-^ Formally: 

r C(7r), if TTi = {t},t]), t]{loc^) / tjilocn. and 
Ci{'Ki) = < tj{loc^) is a distinguished value according to V'j- 

y 0, otherwise. 

Instead of focusing on the value that is assigned to variable loc,^, location-based costs 
can be defined equally well on the value that variable loc^ had before it was changed. In 
either case, if each domain value is distinguished in at most one abstraction location-based 



3. Because tt^ might correspond to several edges in the the original space, each with a different cost or 
moving a different set of tiles, the technically correct definition is: 

Ci{ni) = mm ' 

4. As in footnote 3, the technically correct definition has min^ C{-k) instead of C{-k). 
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costs produce additive abstractions. The name "location-based" is based on the typical 
representations used for puzzles, in which there is a state variable for each physical location 
in the puzzle. For example, in Rubik's Cube one could choose the reference variables to be 
the ones representing the two diagonally opposite corner locations in the puzzle. Note that 
each possible Rubik's cube operator changes exactly one of these locations. An abstract 
state transition would have a primary cost of 1 if the cubie it moved into one of these 
locations was a distinguished cubie in its abstraction, and a primary cost of otherwise. 

For a STRIPS representation of states, location-based costs can be defined by choosing 
an atom a in the Add list for each operator tt and assigning the full cost C (tt) to abstraction 
i if a appears in the Add list of vTj. If atoms arc partitioned so that each atom appears in 
at most one abstraction, this method will define additive costs. 

Although the cost-splitting and location-based methods for defining costs can be applied 
to a wide range of state spaces, they are not guaranteed to define heuristics that are superior 
to other heuristics for a given state space. We determined experimentally that heuristics 
based on cost-splitting substantially improve performance for sufficiently large versions of 
TopSpin and that heuristics based on location-based costs vastly improve the state of the 
art for the 17-Pancake puzzle. In our experiments additive heuristics did not improve the 
state of the art for Rubik's Cube. The following subsections describe the positive results in 
detail. The negative results are discussed in Section 5. 

4.2 TopSpin with Cost-Splitting 

In the (N, _ftr)-TopSpin puzzle (see Figure 6) there are N tiles (numbered 1, . . . , N) arranged 
on a circular track, and two physical movements are possible: (1) the entire set of tiles may 
be rotated around the track, and (2) a segment consisting of K adjacent tiles in the track 
may be reversed. As in previous formulations of this puzzle as a state space (Felner, Zahavi, 
Schaeffer, & Holte, 2005; Holte et al., 2005; Hohe, Newton, Felner, Meshulam, & Furcy, 
2004), we do not represent the first physical movement as an operator, but instead designate 
one of the tiles (tile 1) as a reference tile with the goal being to get the other tiles in increasing 
order starting from this tile (regardless of its position). The state space therefore has A'" 
operators (numbered 1, . . . , N), with operator a reversing the segment of length K starting 
at position a relative to the current position of tile 1 . For certain combinations of N and K 
all possible permutations can be generated from the standard goal state by these operators, 
but in general the space consists of connected components and so not all states are reachable 
(Chen &: Skiena, 1996). In the experiments in this section, K = 4 and N is varied. 

The sets of abstractions used in these experiments are described using a tuple written 
as ai-a2~- ■ --om, indicating that the set contains M abstractions, with tiles 1 . . . (ai) dis- 
tinguished in the first abstraction, tiles (ai -|- 1) . . . (oi -|- 02) distinguished in the second 
abstraction, and so on. For example, 6-6-6 denotes a set of three abstractions in which the 
distinguished tiles are (1 ... 6), (7 . . . 12), and (13 . . . 18) respectively. 

The experiments compare hadd^ the additive use of a set of abstractions, with hmax-, the 
standard use of the same abstractions, in which, as described in Section 2, the full cost of 
each state transition is counted in each abstraction and the heuristic returns the maximum 
distance to goal returned by the different abstractions. Cost-splitting is used to define 
operator costs in the abstract spaces for hadd- Because AT = 4, each operator moves 4 tiles. 
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Figure 6: The TopSpin puzzle. 



If bi of these are distinguished tiles when operator op is applied to state Sj in abstraction i, 
applying op to Sj has a primary cost of ^ in abstraction i. 

In these experiments the heuristic defined by each abstraction is stored in a pattern 
database (PDB). Each abstraction would normally be used to define its own PDB, so that 
a set of M abstractions would require M PDBs. However, for TopSpin, if two (or more) 
abstractions have the same number of distinguished tiles and the distinguished tiles arc all 
adjacent, one PDB can be used for all of them by suitably renaming the tiles before doing 
the PDB lookup. For the 6-6-6 abstractions, for example, only one PDB is needed, but 
three lookups would be done in it, one for each abstraction. Because the position of tile 1 
is effectively fixed, this PDB is N times smaller than it would normally be. For example, 
with A'' = 18, the PDB for the 6-6-6 abstractions contains 17 x 16 x ... x 13 entries. The 
memory needed for each entry in the hadd PDBs is twice the memory needed for an entry 
in the hmax PDBs because of the need to represent fractional values. 

We ran experiments for the values of N and sets of abstractions shown in the first two 
columns of Table 1. Start states were generated by a random walk of 150 moves from the 
goal state. There were 1000, 50 and 20 start states for N = 12, 16 and 18, respectively. 
The average solution length for these start states is shown in the third column of Table 1. 
The average number of nodes generated and the average CPU time (in seconds) for IDA* 
to solve the given start states is shown in the Nodes and Time columns for each of hmax 
and hadd- The Nodes Ratio column gives the ratio of Nodes using hadd to Nodes using 
hmax- A ratio less than one (highlighted in bold) indicates that hadd^ the heuristic based on 
additive abstractions with cost-splitting, is superior to hmax, the standard heuristic using 
the same set of abstractions. 

When N = 12 and N = 16 the best performance is achieved by hmax based on a pair 
of abstractions each having ^ distinguished tiles. As N increases the advantage of hmax 
decreases and, when = 18, hadd outperforms hmax for all abstractions used. Moreover, 
even for the smaller values of hadd outperforms hmax when a set of four abstractions 
with ^ distinguished tiles each is used. This is important because as N increases, memory 
limitations will preclude using abstractions with y distinguished tiles and the only option 
will be to use more abstractions with fewer distinguished tiles each. The results in Table 1 
show that hadd will be the method of choice in this situation. 
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N 


Abs 


Average 
Solution 
Length 


h 

""max 


hadd based on 
cost-splitting 


Nodes 
Ratio 


Nodes 


Time 


Nodes 


Time 


12 
12 

12 


6-6 
4-4-4 

3-3-3-3 


9.138 
9.138 

9.138 


14,821 
269,974 

1,762,262 


0.05 
1.10 

8.16 


53,460 
346,446 

1,388,183 


0.16 
1.33 

6.44 


3.60 
1.28 
0.78 


16 
16 


8-8 
4-4-4-4 


14.040 
14.040 


1,361,042 
4,494,414,929 


3.42 
13,575.00 


2,137,740 
251,946,069 


4.74 
851.00 


1.57 
0.056 


18 
18 


9-9 
6-6-6 


17.000 
17.000 


38,646,344 
18,438,031,512 


165.42 
108,155.00 


21,285,298 
879,249,695 


91.76 
4,713.00 


0.55 
0.04 



Table 1: (AT, 4)-TopSpin results using cost-splitting. 



4.3 The Pancake Puzzle with Location-based Costs 

In this section, we present the experimental results on the 17-Pancake puzzle using location- 
based costs. The same notation as in the previous section is used to denote sets of abstrac- 
tions, e.g. 5-6-6 denotes a set of three abstractions, with the first having tiles (0 ... 4) as 
its distinguished tiles, the second having tiles (5 . . . 10) as its distinguished tiles, and the 
third having tiles (11 ... 16) as its distinguished tiles. Also as before, the heuristic for each 
abstraction is precomputed and stored in a pattern database (PDB). Unlike TopSpin, there 
are no symmetries in the Pancake puzzle that enable different abstractions to make use 
of the same PDB, so a set of M abstractions for the Pancake puzzle requires M different 
PDBs. 

Additive abstractions are defined using the location-based method with just one ref- 
erence location, the leftmost position. This position was chosen because the tile in this 
position changes whenever any operator is applied to any state in the original state space. 
This means that every edge cost in the original space will be fully counted in some abstract 
space as long as each tile is a distinguished tile in some abstraction. As before, we use hadd 
to denote the heuristic defined by adding the values returned by the individual additive 
abstractions. 

Our first experiment compares IDA* using hadd with the best results known for the 
17-Pancake puzzle (Zahavi, Felner, Holte, & Schacffcr, 2006) (shown in Table 2), which 
were obtained using a single abstraction having the rightmost seven tiles (10-16) as its 
distinguished tiles and an advanced search technique called Dual IDA* (DIDA*).^ DID A* 
is an extension of IDA* that exploits the fact that, when states are permutations of tiles 
as in the Pancake puzzle, each state s has an easily computable "dual state" s'^ with the 
special property that inverses of paths from s to the goal are paths from s'^ to the goal. If 
paths and their inverses cost the same, DIDA* defines the heuristic value for state s as the 
maximum of h{s) and h(s'^), and sometimes will decide to search for a least-cost path from 
s"^ to goal when it is looking for a path from s to goal. 

5. In particular, DIDA* with the "jump if larger" (JIL) pohcy and the bidirectional pathmax method 

(BPMX) to propagate the inconsistent heuristic values that arise during dual search. Zahavi et al. 
(2006) provided more details. BPMX was first introduced by Felner et al. (2005). 
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The results of this experiment are shown in the top three rows of Table 3. The Algo- 
rithm column indicates the heuristic search algorithm. The Abs column shows the set of 
abstractions used to generate heuristics. The Nodes column shows the average number of 
nodes generated in solving 1000 randomly generated start states. These start states have 
an average solution length of 15.77. The Time column gives the average number of CPU 
seconds needed to solve these start states on an AMD Athlon(tm) 64 Processor 3700+ with 
2.4 GHz clock rate and 1GB memory. The Memory column indicates the total size of each 
set of PDBs. 



N 


Algorithm 


Abs 


Average 
Solution 
Length 


h based on 
a single large PDB 


Nodes 


Time 


Memory 


17 


DIDA* 


rightmost- 7 


15.77 


124,198,462 


37.713 


98,017,920 



Table 2: The best results known for the 17-Pancake puzzle (Zahavi et al., 2006), which were 

obtained using a single abstraction having the rightmost seven tiles (10 — 16) as its 
distinguished tiles and an advanced search technique called Dual IDA* (DIDA*). 



N 


Algorithm 


Abs 


Average 
Solution 

Length 


hadd based on 
Locatkm-basod Cost s 


Nodes 


Time 


Memory 


17 
17 
17 


IDA* 
IDA* 
IDA* 


4-4-4-5 
5-6-6 
3-7-7 


15.77 
15.77 
15.77 


14,610,039 
1,064,108 
1,061,383 


4.302 
0.342 
0.383 


913,920 
18,564,000 
196,039,920 


17 


DIDA* 


4-4-4-5 


15.77 


368,925 


0.195 


913,920 


17 


DIDA* 


5-6-6 


15.77 


44,618 


0.028 


18,564,000 


17 


DIDA* 


3-7-7 


15.77 


37,155 


0.026 


196,039,920 



Table 3: 17-Pancake puzzle results using hadd based on location-based costs. 

Clearly, the use of hadd based on location-based costs results in a very significant re- 
duction in nodes generated compared to using a single large PDB, even when the latter 
has the advantage of being used by a more sophisticated search algorithm. Note that the 
total memory needed for the 4-4-4-5 PDBs is only one percent of the memory needed for 
the rightmost-7 PDB, and yet IDA* with 4-4-4-5 generates 8.5 times fewer nodes than 
DIDA* with the rightmost-7 PDB. Getting excellent search performance from a very small 
PDB is especially important in situations where the cost of computing the PDBs must be 
taken into account in addition to the cost of problem-solving (Holte et al., 2005). 

The memory requirements increase significantly when abstractions contain more distin- 
guished tiles, but in this experiment the improvement of the running time docs not increase 
accordingly. For example, the 3-7-7 PDBs use ten times more memory than the 5-6-6 PDBs, 
but the running time is almost the same. This is because the 5-6-6 PDBs are so accurate 
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there is little room to improve them. The average heuristic value on the start states using 
the 5-6-6 PDBs is 13.594, only 2.2 less than the actual average solution length. The average 
heuristic value using the 3-7-7 PDBs is only slightly higher (13.628). 

The last three rows in Table 3 show the results when hadd with location-based costs 
is used in conjunction with DIDA*. These results show that combining our additive ab- 
stractions with state-of-the-art search techniques results in further significant reductions in 
nodes generated and CPU time. For example, the 5-6-6 PDBs use only 1 /5 of the memory 
of the rightmost-? PDB but reduce the number of nodes generated by DIDA* by a factor 
of 2783 and the CPU time by a factor of 1347. 

To compare hadd to hmax we ran plain IDA* with hmax on the same 1000 start states, 
with a time limit for each start state ten times greater than the time needed to solve 
the start state using hadd- With this time limit only 63 of the 1000 start states could 
be solved with hmax using the 3-7-7 abstraction, only 5 could be solved with hmax using 
the 5-6-6 abstraction, and only 3 could be solved with hmax using the 4-4-4-5 abstraction. 
To determine if hadd^ superiority over hmax for location-based costs on this puzzle could 
have been predicted using Lemma 3.6, we generated 100 million random 17-Pancake puzzle 
states and tested how many satisfied the requirements of Lemma 3.6. Over 98% of the 
states satisfied those requirements for the 3-7-7 abstraction, and over 99.8% of the states 
satisfied its requirements for the 5-6-6 and 4-4-4-5 abstractions. 



5. Negative Results 

Not all of our experiments yielded positive results. Here we explore some trials where our 
additive approaches did not perform as well. By examining some of these cases closely, we 
shed light on the conditions which might indicate when these approaches will be useful. 



5.1 TopSpin vi^ith Location-Based Costs 



In this experiment, we used the 6-6-6 abstraction of (18, 4)-TopSpin as in Section 4.2 but 
with location-based costs instead of cost-splitting. The primary cost of operator a, the 
operator that reverses the segment consisting of locations a to a -|- 3 {modulo 18), is 1 
in abstract space i if the tile in location a before the operator is applied is distinguished 
according to abstraction ipi and otherwise. 

This definition of costs was disastrous, resulting in C*{ti,gi) = for all abstract states 
in all abstractions. In other words, in finding a least-cost path it was never necessary to 
use operator a when there was a distinguished tile in location a. It was always possible 
to move towards the goal by applying another operator, a', with a primary cost of 0. To 

of (7, 4^)-TopSpin. 






4 


5 


6 


CO 


2 


1 



illustrate how this is possible, consider state 
This state can be transformed into the goal in a single iiio\'t} 
four tiles starting with tile 3 produces the state 



CO 


4 


5 


6 





1 


2 



lie opt^rator that reverses the 
which is equal to 

the goal state when it is cyclically shifted to put into the leftmost position. With the 4-3 
abstraction this move has a primary cost of in the abstract space based on tiles 4. ..6, but 
it would have a primary cost of 1 in the abstract space based on tiles 0...3 (because tile 3 
is in the leftmost location changed by the operator). However the following sequence maps 
tiles 0...3 to their goal locations and has a primary cost of in this abstract space (because 
a "don't care" tile is always moved from the reference location): 
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* 


* 


* 


3 


2 


1 





* 


* 


1 


2 


3 


* 





* 


3 


2 


1 


* 


* 







1 


2 


3 


* 


* 


* 



5.2 Rubik's Cube 

The success of cost-splitting on (18,4)-TopSpin suggested it might also provide an im- 
proved heuristic for Rubik's Cube, which can be viewed as a 3-dimensional version of 
(20,8)-TopSpin. We used the standard method of partitioning the cubies to create three 
abstractions, one based on the 8 corner cubies, and the others based on 6 edge cubies each. 
The standard heuristic based on this partitioning, hmax, expanded approximately three 
times fewer nodes than hadd based on this partitioning and primary costs defined by cost- 
splitting. The result was similar whether the 24 symmetries of Rubik's Cube were used to 
define multiple heuristic lookups or not. 

We believe the reason for cost-splitting working well for (18,4)-TopSpin but not Rubik's 
Cube is that an operator in Rubik's Cube moves more cubies than the number of tiles 
moved by an operator in (18,4)-TopSpin. To test if operators moving more tiles reduces the 
effectiveness of cost-splitting we solved 1000 instances of (12,i^)-TopSpin for various values 
of K, all using the 3-3-3-3 abstraction. The results are shown in Table 4. The Nodes Ratio 
column gives the ratio of Nodes using hadd to Nodes using hmax- A ratio less than one 
(highlighted in bold) indicates that hadd is superior to hmax- The results clearly show that 
hadd based on cost-splitting is superior to hmax for small K and steadily loses its advantage 
as K increases. The same phenomenon can also be seen in Table 1, where increasing A'^ 
relative to K increases the effectiveness of additive heuristics based on cost-splitting. 





hmax 


hadd based on 




K 


cost-splitting 


Nodes 




Nodes 


Time 


Nodes 


Time 


Ratio 


3 


486,515 


2.206 


207,479 


0.952 


0.42 


4 


1,762,262 


8.164 


1,388,183 


6.437 


0.78 


5 


8,978 


0.043 


20,096 


0.095 


2.23 


6 


193,335,181 


901.000 


2,459,204,715 


11,457.000 


12.72 



Table 4: (12, i^)-TopSpin results using cost-splitting. 



We also investigated location-based costs for Rubik's Cube. The cubies were partitioned 
into four groups, each containing three edge cubies and two corner cubies, and an abstraction 
was defined using each group. Two diagonally opposite corner positions were used as the 
reference locations (as noted above, each Rubik's Cube operator changes exactly one of these 
locations). The resulting hadd heuristic was so weak we could not solve random instances 
of the puzzle with it. 
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5.3 The Pancake Puzzle with Cost-SpUtting 

Table 5 compares hadd and hmax on the 13-Pancake puzzle when costs are defined using 
cost-splitting. The memory is greater for hadd than hmax because the fractional entries that 
cost-splitting produces require more bits per entry than the small integer values stored in 
the hmax PDB. In terms of both run-time and number of nodes generated, hadd is inferior 
to hmax for these costs, the opposite of what was seen in Section 4.3 using location-based 
costs. 



N 


Abs 


Average 
Solution 
Length 


hmax 


hadd based on 
costing-splitting 


Nodes 


Time 


Nodes 


Time 


13 


6-7 


11.791 


166,479 


0.0466 


1,218,903 


0.3622 



Table 5: hadd vs. hmax on the 13-Pancake puzzle. 



Cost-splitting, as we have defined it for the Pancake puzzle, adversely affects hadd be- 
cause it enables each individual abstraction to get artificially low estimates of the cost of 
solving its distinguished tiles by increasing the number of "don't care" tiles that are moved. 
For example, with cost-splitting the least-cost sequence of operators to get tile "0" into 
its goal position from abstract state * Q * * * is not the obvious single move of 
reversing the first two positions. That move costs ^, whereas the 2-move sequence that 
reverses the entire state and then reverses the first four positions costs only i + 



4- 



As a specific example, consider state 



7 


4 


5 


6 


3 


8 





10 


9 


2 


1 


11 



of 



the 12-Pancake puzzle. Using the 6-6 abstractions, the minimum number of moves to get 
tiles 0-5 into their goal positions is 8, and for 6-11 it is 7, where in each case we ignore the 
final locations of the other tiles. Thus, hmax is 8. By contrast, hadd is 6.918, which is less 
than even the smaller of the two numbers used to define hmax- The two move sequences 
whose costs are added to compute hadd ^oi this state each have slightly more moves than 
the corresponding sequences on which hm,ax is based (10 and 9 compared to 8 and 7), but 
involve more than twice as many "don't care" tiles (45 and 44 compared to 11 and 17) and 
so are less costly. 

There is hope that this pathological situation can be detected, at least sometimes, by 
inspecting the residual costs. If the residual costs are defined to be complementary to the 
primary costs {i.e. Ri{'Ki) = C{'k) — Cj(7rj)), as wc have done, then decreasing the primary 
cost increases the residual cost. If the residual cost is sufficiently large in one of the abstract 
spaces the conditions of Lemma 3.10 will be satisfied, signalling that the value returned by 
hadd is provably too low. This is the subject of the next section, on "infeasibility" . 



6. Infeasible Heuristic Values 

This section describes a way to increase the heuristic values defined by additive abstractions 
in some circumstances. The key to the approach is to identify "infeasible" values — ones that 
cannot possibly be the optimal solution cost. Once identified the infeasible values can be 
increased to give a better estimate of the solution cost. An example of infeasibility occurs 
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with the Manhattan Distance [MD) heuristic for the shding tile puzzle. It is well-known 
that the parity of MD{t) is the same as the parity of the optimal solution cost for state t. If 
some other heuristic for the sliding tile puzzle returns a value of the opposite parity, it can 
safely be increased until it has the correct parity. This example relies on specific properties 
of the MD heuristic and the puzzle. Lemma 3.10 gives a problem-independent method for 
testing infeasibility, and that is what we will use. 

To illustrate how infeasibility can be detected using Lemma 3.10 consider the example 
in Figure 3. The solution to the abstract problem shown in the middle part of the figure 
requires 9 distinguished moves, so Cf (^i) = 9. The abstract paths that solve the problem 
with 9 distinguished moves require, at a minimum, 9 "don't care" moves, so R\{Ai) = 9. 
A similar calculation for the abstract space on the right of the figure yields (72(^2) = 5 
and R*2{A2) = 7. The value of haddiAg) is therefore Cl(Ai) + C^l^a) = 9 + 5 = U. 
This value is based on the assumption that there is a path in the original space that makes 
Ci{Ai) = 9 moves of tiles 1, 3, 5, and 7, and (7|(>l2) = 5 moves of the other tiles. However, 
the value of Rl{Ai) tells us that any path that uses only 9 moves of tiles 1, 3, 5, and 7 to 
put them into their goal locations must make at least 9 moves of the other tiles, it cannot 
possibly make just 5 moves. Therefore there does not exist a solution costing as little as 
CUAi) + C*2{A2) = U. 

To illustrate the potential of this method for improving additive heuristics, Table 6 
shows the average results of IDA* solving 1000 test instances of the 15-puzzle using two 
different tile partitionings (shown in Figure 7) and costs defined by the method described 
in Section 2.1. These additive heuristics have the same parity property as Manhattan 
Distance, so when infeasibility is detected 2 can be added to the value. The hadd columns 
show the average heuristic value of the 1000 start states. As can be seen infeasibility 
checking increases the initial heuristic value by over 0.5 and reduces the number of nodes 
generated and the CPU time by over a factor of 2. However, there is a space penalty for 
this improvement, because the R* values must be stored in the pattern database in addition 
to the normal C* values. This doubles the amount of memory required, and it is not clear 
if storing R* is the best way to use this extra memory. This experiment merely shows that 
infeasibility checking is one way to use extra memory to speed up search for some problems. 
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Figure 7: Different tile partitionings for the 15-puzzle (left: 5-5-5; right: 6-6-3). 

Slightly stronger results were obtained for the {N, 4)-TopSpin puzzle with costs defined 
by cost-splitting, as described in Section 4.2. The "No Infeasibility Check" columns in Table 
7 are the same as the "/iadd based on cost-splitting" columns of the corresponding rows in 
Table 1. Comparing these to the "Infeasibility Check" columns shows that in most cases 
infeasibility checking reduces the number of nodes generated and the CPU time by roughly 
a factor of 2. 
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iV 


Abs 


Average 
Solution 
Length 


hadd based on zero-one cost-splitting 


No Infeasibility Check 


Infeasibility Check 


hadd 


Nodes 


Time 


hadd 


Nodes 


Time 


15 
15 


5- 5-5 

6- 6-3 


52.522 
52.522 


41.56 
42.13 


3,186,654 
1,858,899 


0.642 
0.379 


42.10 
42.78 


1,453,358 
784,145 


0.312 
0.171 



Table 6: The effect of infeasibility checking on the 15-puzzle. 



When location-based costs are used with TopSpin infeasibility checking adds one to the 
heuristic value of almost every state. However, this simply means that most states have a 
heuristic value of 1 instead of (recall the discussion in Section 5.1), which is still a very 
poor heuristic. 







Average 


hadd 


based on costing-splittin 


g 


N 


Abs 


Solution 


No Infeasibility Check 


Infeasibility Check 






Length 


Nodes 


Time 


Nodes 


Time 


12 


6-6 


9.138 


53,460 


0.16 


20,229 


0.07 


12 


4-4-4 


9.138 


346,446 


1.33 


174,293 


0.62 


12 


3-3-3-3 


9.138 


1,388,183 


6.44 


1,078,853 


4.90 


16 


8-8 


14.040 


2,137,740 


4.74 


705,790 


1.80 


16 


4-4-4-4 


14.040 


251,946,069 


851.00 


203,213,736 


772.04 


18 


6-6-6 


17.000 


879,249,695 


4,713.00 


508,851,444 


2,846.52 



Table 7: The effect of infeasibility checking on {N, 4)-TopSpin using cost-splitting. 



Infeasibility checking produces almost no benefit for the 17-Pancake puzzle with location- 
based costs because the conditions of Lemma 3.10 are almost never satisfied. The exper- 
iment discussed at the end of Section 4.3 showed that fewer than 2% of the states satisfy 
the conditions of Lemma 3.10 for the 3-7-7 abstraction, and fewer than 0.2% of the states 
satisfy the conditions of Lemma 3.10 for the 5-6-6 and 4-4-4-5 abstractions. 

Infeasibility checking for the 13-Pancake puzzle with cost-splitting also produces very 
little benefit, but for a different reason. For example. Table 8 shows the effect of infeasibility 

checking on the 13-Pancake puzzle; the results shown are averages over 1000 start states. 
Cost-splitting in this state space produces fractional edge costs that are multiples of ^eoseo 
(360360 is the Least Common Multiple of the integers from 1 to 13), and therefore if 
infeasibility is detected the amount added is ^qq^qq ■ But recall that hadd, with cost-splitting, 
is defined as the ceiling of Xli=i C'i(^i)- The value of hadd will therefore be the same, whether 
3gQ3gQ is added or not, unless the sum of the Ci^iTi) is exactly an integer. As Table 8 shows, 
this does happen but only rarely. 
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N 


Abs 


Average 
Solution 
Length 


hadd based on costing-splitting 


No Infeasibility Check 


Infeasibility Check 


Nodes 


Time 


Nodes 


Time 


13 


6-7 


11.791 


1,218,903 


0.3622 


1,218,789 


0.4453 



Table 8: The effect of infeasibility checking on the 13-Pancake puzzle using cost-splitting. 



7. Conclusions 

In this paper we have presented a formal, general definition of additive abstractions that 
removes the restrictions of most previous definitions, thereby enabling additive abstrac- 
tions to be defined for any state space. We have proven that heuristics based on additive 
abstractions are consistent as well as admissible. Our definition formalizes the intuitive 
idea that abstractions will be additive provided the cost of each operator is divided among 
the abstract spaces, and we have presented two specific, practical methods for defining ab- 
stract costs, cost-splitting and location-based costs. These methods were applied to three 
standard state spaces that did not have additive abstractions according to previous def- 
initions: TopSpin, Rubik's Cube, and the Pancake puzzle. Additive abstractions using 
cost-splitting reduce search time substantially for (18,4)-TopSpin and additive abstractions 
using location-based costs reduce search time for the 17-Pancake puzzle by three orders of 
magnitude over the state of the art. We also report negative results, for example on Rubik's 
Cube, demonstrating that additive abstractions are not always superior to the standard, 
maximum-based method for combining multiple abstractions. 

A distinctive feature of our definition is that each edge in an abstract space has two costs 
instead of just one. This was inspired by previous definitions treating "distinguished" moves 
differently than "don't care" moves in calculating least-cost abstract paths. Formalizing this 
idea with two costs per edge has enabled us to develop a way of testing if the heuristic value 
returned by additive abstractions is provably too low ( "infeasible" ) . This test produced no 
speedup when applied to the Pancake puzzle, but roughly halved the search time for the 
15-puzzle and in most of our experiments with TopSpin. 
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